validation metric




Supplementary for: UCLID-Net: Single View Reconstruction in Object Space

Anonymous Author(s)

Neural Information Processing Systems

This section defines the metrics and loss functions used in the main paper. The Earth Mover's Distance (EMD) is a distance that can also be used to compare point clouds. We use the F-Score as a validation metric on the ShapeNet dataset, and we introduce the shell-Intersection over Union (sIoU), which we likewise use as a validation metric on ShapeNet. We also present details of the architecture and training procedure for UCLID-Net.
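The F-Score mentioned above is a standard point-cloud validation metric: precision is the fraction of predicted points within a threshold of the ground truth, recall the fraction of ground-truth points covered by the prediction. A minimal brute-force NumPy sketch (the threshold value and the pairwise-distance implementation are assumptions, not the paper's):

```python
import numpy as np

def fscore(pred, gt, d=0.01):
    """F-Score at distance threshold d between point clouds pred (N,3) and gt (M,3)."""
    # Brute-force pairwise distances, shape (N, M)
    dists = np.linalg.norm(pred[:, None, :] - gt[None, :, :], axis=-1)
    precision = (dists.min(axis=1) < d).mean()  # predicted points near the ground truth
    recall = (dists.min(axis=0) < d).mean()     # ground-truth points covered by the prediction
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

Identical clouds score 1.0; clouds farther apart than the threshold score 0.0.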


Accessible, Realistic, and Fair Evaluation of Positive-Unlabeled Learning Algorithms

Wang, Wei, Wu, Dong-Dong, Li, Ming, Zhang, Jingxiong, Niu, Gang, Sugiyama, Masashi

arXiv.org Artificial Intelligence

Positive-unlabeled (PU) learning is a weakly supervised binary classification problem, in which the goal is to learn a binary classifier from only positive and unlabeled data, without access to negative data. In recent years, many PU learning algorithms have been developed to improve model performance. However, experimental settings are highly inconsistent, making it difficult to identify which algorithm performs better. In this paper, we propose the first PU learning benchmark to systematically compare PU learning algorithms. During our implementation, we identify subtle yet critical factors that affect the realistic and fair evaluation of PU learning algorithms. On the one hand, many PU learning algorithms rely on a validation set that includes negative data for model selection. This is unrealistic in traditional PU learning settings, where no negative data are available. To handle this problem, we systematically investigate model selection criteria for PU learning. On the other hand, the problem settings and solutions of PU learning fall into different families, i.e., the one-sample and two-sample settings. However, existing evaluation protocols are heavily biased towards the one-sample setting and neglect the significant difference between them. We identify the internal label shift problem of unlabeled training data for the one-sample setting and propose a simple yet effective calibration approach to ensure fair comparisons within and across families. We hope our framework will provide an accessible, realistic, and fair environment for evaluating PU learning algorithms in the future.
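A common baseline in this literature (not necessarily any specific algorithm in the benchmark) is the non-negative PU risk estimator, which evaluates a classifier from positive and unlabeled scores plus an assumed class prior:

```python
import numpy as np

def nnpu_risk(scores_p, scores_u, pi, loss=lambda z: np.log1p(np.exp(-z))):
    """Non-negative PU risk estimate (Kiryo et al.-style), a common baseline.

    scores_p : classifier scores on positive examples
    scores_u : classifier scores on unlabeled examples
    pi       : assumed class prior P(y = +1)
    """
    r_p_plus = loss(scores_p).mean()           # positive-class loss on positives
    r_p_minus = loss(-scores_p).mean()         # negative-class loss on positives
    r_u_minus = loss(-scores_u).mean()         # negative-class loss on unlabeled
    neg_risk = r_u_minus - pi * r_p_minus      # unbiased estimate of the negative risk
    return pi * r_p_plus + max(neg_risk, 0.0)  # clip so the risk stays non-negative
```

The clipping in the last line is what distinguishes the non-negative variant from the unbiased estimator, which can go negative and encourage overfitting.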


Rotationally Invariant Latent Distances for Uncertainty Estimation of Relaxed Energy Predictions by Graph Neural Network Potentials

Musielewicz, Joseph, Lan, Janice, Uyttendaele, Matt, Kitchin, John R.

arXiv.org Artificial Intelligence

Graph neural networks (GNNs) have been shown to be astonishingly capable models for molecular property prediction, particularly as surrogates for expensive density functional theory calculations of relaxed energy for novel material discovery. However, one limitation of GNNs in this context is the lack of useful uncertainty prediction methods, as this is critical to the material discovery pipeline. In this work, we show that uncertainty quantification for relaxed energy calculations is more complex than uncertainty quantification for other kinds of molecular property prediction, due to the effect that structure optimizations have on the error distribution. We propose that distribution-free techniques are more useful tools for assessing calibration, recalibrating, and developing uncertainty prediction methods for GNNs performing relaxed energy calculations. We also develop a relaxed energy task for evaluating uncertainty methods for equivariant GNNs, based on distribution-free recalibration and using the Open Catalyst Project dataset. We benchmark a set of popular uncertainty prediction methods on this task, and show that latent distance methods, with our novel improvements, are the most well-calibrated and economical approach for relaxed energy calculations. Finally, we demonstrate that our latent space distance method produces results which align with our expectations on a clustering example, and on specific equation of state and adsorbate coverage examples from outside the training dataset.
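The latent-distance idea the abstract describes can be sketched generically: score each query by its distance to the nearest training embeddings in latent space, with larger distances signalling less trustworthy predictions. This is a minimal illustration, not the paper's method (the k-nearest-neighbour averaging and the Euclidean, non-rotationally-invariant metric are assumptions):

```python
import numpy as np

def latent_distance_uncertainty(train_z, test_z, k=5):
    """Uncertainty proxy: mean distance to the k nearest training latents.

    train_z : (N, D) latent embeddings of the training set
    test_z  : (M, D) latent embeddings of the queries
    """
    # Pairwise distances from each query to every training embedding, (M, N)
    d = np.linalg.norm(test_z[:, None, :] - train_z[None, :, :], axis=-1)
    d.sort(axis=1)
    return d[:, :k].mean(axis=1)  # larger = further from the training data
```

A query that lies inside the training distribution gets a lower score than one far outside it.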


Sea wave data reconstruction using micro-seismic measurements and machine learning methods

Iafolla, Lorenzo, Fiorenza, Emiliano, Chiappini, Massimo, Carmisciano, Cosmo, Iafolla, Valerio Antonio

arXiv.org Artificial Intelligence

Sea wave monitoring is key in many applications in oceanography, such as the validation of weather and wave models. Conventional in situ solutions are based on moored buoys whose measurements are often recognized as a standard. However, being exposed to a harsh environment, they are not reliable, need frequent maintenance, and their datasets feature many gaps. To overcome these limitations, we propose a system comprising a buoy, a micro-seismic measuring station, and a machine learning algorithm. The working principle is based on measuring the micro-seismic signals generated by the sea waves: the machine learning algorithm is trained to reconstruct the missing buoy data from the micro-seismic data. As the micro-seismic station can be installed indoors, it ensures high reliability, while the machine learning algorithm provides accurate reconstruction of the missing buoy data. In this work, we present the methods to process the data, develop and train the machine learning algorithm, and assess the reconstruction accuracy. As a case study, we used experimental data collected in 2014 from the Northern Tyrrhenian Sea, demonstrating that the data reconstruction can be done both for significant wave height and wave period. The proposed approach was inspired by Data Science, whose methods were the foundation for the new solutions presented in this work. For example, estimating the period of the sea waves, often not discussed in previous works, was relatively simple with machine learning. In conclusion, the experimental results demonstrated that the new system can overcome the reliability issues of the buoy while keeping the same accuracy.
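The gap-filling idea can be sketched as a regression from micro-seismic features to the buoy's significant wave height. This toy example uses a plain linear least-squares fit on synthetic data, not the authors' actual algorithm; the feature meanings are hypothetical:

```python
import numpy as np

# Hypothetical data: rows = time windows, columns = micro-seismic features
# (e.g. band-limited RMS amplitudes); y = buoy significant wave height Hs.
rng = np.random.default_rng(0)
X = rng.random((200, 3))
true_w = np.array([2.0, 0.5, 1.0])
y = X @ true_w + 0.3                       # synthetic linear relation

mask = np.ones(200, dtype=bool)            # True where buoy data exist
mask[150:] = False                         # simulate a gap in the buoy record

A = np.hstack([X, np.ones((200, 1))])      # add an intercept column
w, *_ = np.linalg.lstsq(A[mask], y[mask], rcond=None)
y_filled = A[~mask] @ w                    # reconstruct Hs inside the gap
```

Because the seismic station keeps recording through the buoy outage, the fitted model can fill the gap in the buoy record from the seismic features alone.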


Deep Convolutional Neural Network for Plume Rise Measurements in Industrial Environments

Koushafar, Mohammad, Sohn, Gunho, Gordon, Mark

arXiv.org Artificial Intelligence

Estimating Plume Cloud (PC) height is essential for various applications, such as global climate models. Smokestack Plume Rise (PR) is the constant height at which the PC is carried downwind as its momentum dissipates and the PC and ambient temperatures equalize. Although different parameterizations are used in most air-quality models to predict PR, they have yet to be verified thoroughly. This paper proposes a low-cost measurement technology to monitor smokestack PCs and make long-term, real-time measurements of PR. For this purpose, a two-stage method is developed based on Deep Convolutional Neural Networks (DCNNs). In the first stage, an improved Mask R-CNN, called Deep Plume Rise Network (DPRNet), is applied to recognize the PC. Here, image processing analyses and least squares are used, respectively, to detect PC boundaries and to fit an asymptotic model to the boundaries' centerline. The y-coordinate of this model's critical point is taken as the PR. In the second stage, a geometric transformation phase converts image measurements into real-world ones. A wide range of images with different atmospheric conditions, including day, night, and cloudy/foggy, have been selected for training DPRNet. The obtained results show that the proposed method outperforms widely used networks in smoke border detection and recognition.
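The second-stage geometric transformation converts pixel measurements to real-world heights. The paper's exact transformation is not given here, but the standard pinhole-camera relation illustrates the idea (the function name and parameters are hypothetical):

```python
def pixels_to_metres(h_px, distance_m, focal_px):
    """Pinhole-camera conversion of an image height to a real-world height.

    h_px       : plume-rise height measured in the image, in pixels
    distance_m : camera-to-smokestack distance, in metres
    focal_px   : focal length expressed in pixels
    """
    # Similar triangles: real height / distance = pixel height / focal length
    return h_px * distance_m / focal_px
```

For example, a 100-pixel rise seen by a 1000-pixel focal-length camera at 500 m corresponds to 50 m.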


Exploring validation metrics for offline model-based optimisation

Beckham, Christopher, Piche, Alexandre, Vazquez, David, Pal, Christopher

arXiv.org Artificial Intelligence

In offline model-based optimisation (MBO) we are interested in using machine learning to design candidates that maximise some measure of desirability through an expensive but real-world scoring process. Offline MBO tries to approximate this expensive scoring function and use it to evaluate generated designs; however, evaluation is inexact because one approximation is being evaluated with another. Instead, we ask ourselves: if we did have the real-world scoring function at hand, what cheap-to-compute validation metrics would correlate best with it? Since the real-world scoring function is available for simulated MBO datasets, insights obtained from these can be transferred over to real-world offline MBO tasks where the real-world scoring function is expensive to compute. To address this, we propose a conceptual evaluation framework that is amenable to measuring extrapolation, and apply it to conditional denoising diffusion models. Empirically, we find that two validation metrics -- agreement and Fréchet distance -- correlate quite well with the ground truth. When there is high variability in conditional generation, feedback is required in the form of an approximated version of the real-world scoring function. Furthermore, we find that generating high-scoring samples may require heavily weighting the generative model in favour of sample quality, potentially at the cost of sample diversity.
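One common formulation of the Fréchet distance used as a validation metric (as in FID) compares two Gaussians fitted to sample sets; whether this is exactly the paper's variant is an assumption. A pure-NumPy sketch, using the identity Tr((Σ₁Σ₂)^½) = Tr((Σ₂^½ Σ₁ Σ₂^½)^½) to stay within symmetric matrix square roots:

```python
import numpy as np

def frechet_distance(mu1, cov1, mu2, cov2):
    """Squared Fréchet distance between two Gaussians N(mu1, cov1), N(mu2, cov2)."""
    def sqrtm_psd(m):
        # Matrix square root of a symmetric PSD matrix via eigendecomposition
        w, v = np.linalg.eigh(m)
        return (v * np.sqrt(np.clip(w, 0, None))) @ v.T

    s2 = sqrtm_psd(cov2)
    covmean = sqrtm_psd(s2 @ cov1 @ s2)  # symmetric stand-in for (cov1 @ cov2)^0.5
    diff = mu1 - mu2
    return diff @ diff + np.trace(cov1 + cov2 - 2 * covmean)
```

Identical Gaussians give 0; shifting one mean by a unit vector under identity covariances gives exactly 1.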


MetaStackVis: Visually-Assisted Performance Evaluation of Metamodels

Ploshchik, Ilya, Chatzimparmpas, Angelos, Kerren, Andreas

arXiv.org Artificial Intelligence

Stacking (or stacked generalization) is an ensemble learning method with one main distinctiveness from the rest: even though several base models are trained on the original data set, their predictions are further used as input data for one or more metamodels arranged in at least one extra layer. Composing a stack of models can produce high-performance outcomes, but it usually involves a trial-and-error process. Therefore, our previously developed visual analytics system, StackGenVis, was mainly designed to assist users in choosing a set of top-performing and diverse models by measuring their predictive performance. However, it only employs a single logistic regression metamodel. In this paper, we investigate the impact of alternative metamodels on the performance of stacking ensembles using a novel visualization tool, called MetaStackVis. Our interactive tool helps users to visually explore individual metamodels and pairs of metamodels according to their predictive probabilities and multiple validation metrics, as well as their ability to predict specific problematic data instances. MetaStackVis was evaluated with a usage scenario based on a medical data set and via expert interviews.
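A stacking ensemble with a single logistic-regression metamodel, the configuration the abstract attributes to StackGenVis, can be sketched with scikit-learn; the base models and synthetic dataset here are arbitrary placeholders:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=400, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Base models' cross-validated predictions become the metamodel's inputs.
stack = StackingClassifier(
    estimators=[("rf", RandomForestClassifier(random_state=0)),
                ("knn", KNeighborsClassifier())],
    final_estimator=LogisticRegression(),  # the single metamodel layer
)
stack.fit(X_tr, y_tr)
score = stack.score(X_te, y_te)
```

Swapping `final_estimator` for other classifiers is exactly the kind of metamodel comparison the paper's tool is built to explore visually.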


Do you really know the difference between Test and Validation Datasets?

#artificialintelligence

Many people don't really know the difference between test and validation sets. In Machine Learning these two terms are often used improperly, but they indicate two very different things. Even the literature sometimes reverses their meaning. When training a model, the dataset is usually divided into a train set, a validation set, and a test set, but why are the last two sets needed? Keep reading and you will find your answers.
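The three-way split the post refers to can be sketched as follows; the fractions and seed are arbitrary. The validation set is consulted repeatedly for model selection and tuning, while the test set is held out and used only once for the final, unbiased performance estimate:

```python
import numpy as np

def train_val_test_split(n, val_frac=0.15, test_frac=0.15, seed=0):
    """Shuffle indices 0..n-1 into disjoint train/validation/test index sets."""
    idx = np.random.default_rng(seed).permutation(n)
    n_val = int(n * val_frac)
    n_test = int(n * test_frac)
    # Validation: tune hyperparameters. Test: touch once, at the very end.
    return idx[n_val + n_test:], idx[:n_val], idx[n_val:n_val + n_test]

train, val, test = train_val_test_split(100)
```

With 100 samples and the default fractions this yields 70 training, 15 validation, and 15 test indices, with no overlap between the three sets.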